
Poster: Historyisimportant. Date: Apr 16, 2007 6:48am
Forum: faqs Subject: Understanding the IA "Crawl"

Dear IA,

The FAQs and the forum do not answer my question. I think many would benefit if you answered it in the FAQs and on the forum.

I visit the IA about once a month to see if my site has been updated yet. It never is.

Each time I visit, the IA crawler comes to crawl my site. Please note the raw log entries below. I would like to know whether the crawler is actually crawling. There is a robots.txt file on my site, but it is empty (which I believe means all crawlers are allowed). A technician at Network Solutions said that the IA crawler is different from all of the other crawlers (e.g., Googlebot, MSN, AOL). I have seen how they crawl, and they crawl through each page. Your IA crawler doesn't appear to do that, ever. It has visited many times, but never appears to "crawl through". Would you help me, after all of these months, and explain what is going on? You could answer this in the FAQs too, as many people don't understand the IA (including me), and I have read through the forum and the FAQs many times. The raw log I am speaking about is below.
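The belief that an empty robots.txt allows every crawler can be checked locally. What follows is a minimal sketch using Python's standard urllib.robotparser module. The example.com URL and the helper name allows are hypothetical, chosen only for illustration, and the check shows how that parser reads the rules, not how the IA crawler itself behaves.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical site URL, used only for illustration.
PAGE = "http://example.com/pages/1/index.htm"

# An empty robots.txt contains no rules, so nothing is disallowed.
empty_ok = allows("", "ia_archiver", PAGE)

# For contrast: a file that explicitly blocks the ia_archiver user agent.
blocked = allows("User-agent: ia_archiver\nDisallow: /\n", "ia_archiver", PAGE)

print(empty_ok, blocked)
```

To test a real file, paste its exact contents into the first argument of allows.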



208.70.29.203 - - [13/Apr/2007:22:09:33 -0400] "GET /robots.txt HTTP/1.1" 200 25 "-" "ia_archiver-web.archive.org"

208.70.28.76 - - [13/Apr/2007:22:10:38 -0400] "GET /robots.txt HTTP/1.1" 200 25 "-" "ia_archiver-web.archive.org"

208.70.29.182 - - [13/Apr/2007:22:11:49 -0400] "GET /robots.txt HTTP/1.1" 200 25 "-" "ia_archiver-web.archive.org"

208.70.29.176 - - [13/Apr/2007:22:11:56 -0400] "GET /robots.txt HTTP/1.1" 200 25 "-" "ia_archiver-web.archive.org"

208.70.28.46 - - [13/Apr/2007:22:12:01 -0400] "GET /robots.txt HTTP/1.1" 200 25 "-" "ia_archiver-web.archive.org"

208.70.29.96 - - [13/Apr/2007:22:12:32 -0400] "GET /robots.txt HTTP/1.1" 200 25 "-" "ia_archiver-web.archive.org"

208.70.29.184 - - [13/Apr/2007:22:12:36 -0400] "GET /robots.txt HTTP/1.1" 200 25 "-" "ia_archiver-web.archive.org"

208.70.28.56 - - [13/Apr/2007:22:12:47 -0400] "GET /robots.txt HTTP/1.1" 200 25 "-" "ia_archiver-web.archive.org"

208.70.28.43 - - [13/Apr/2007:22:12:52 -0400] "GET /robots.txt HTTP/1.1" 200 25 "-" "ia_archiver-web.archive.org"

208.70.29.98 - - [13/Apr/2007:22:12:53 -0400] "GET /pages/1/index.htm HTTP/1.1" 200 118 "-" "ia_archiver-web.archive.org"

208.70.29.98 - - [13/Apr/2007:22:12:53 -0400] "GET /robots.txt HTTP/1.1" 200 25 "-" "ia_archiver-web.archive.org"

208.70.29.206 - - [13/Apr/2007:22:12:53 -0400] "GET /robots.txt HTTP/1.1" 200 25 "-" "ia_archiver-web.archive.org"
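Of the twelve entries above, eleven request /robots.txt and one requests /pages/1/index.htm. A tally like that can be produced from a larger log with a short script. This is a rough sketch that assumes the log uses the Apache "combined" format seen in these entries. The names LOG_RE and tally_paths are hypothetical, and SAMPLE holds three of the lines above.

```python
import re
from collections import Counter

# Pattern for the Apache "combined" log format (assumed from the entries above).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def tally_paths(lines, agent_substring="ia_archiver"):
    """Count requested paths for log lines whose user agent contains agent_substring."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match and agent_substring in match.group("agent"):
            counts[match.group("path")] += 1
    return counts


# Three of the entries quoted in this post.
SAMPLE = [
    '208.70.28.43 - - [13/Apr/2007:22:12:52 -0400] "GET /robots.txt HTTP/1.1" 200 25 "-" "ia_archiver-web.archive.org"',
    '208.70.29.98 - - [13/Apr/2007:22:12:53 -0400] "GET /pages/1/index.htm HTTP/1.1" 200 118 "-" "ia_archiver-web.archive.org"',
    '208.70.29.98 - - [13/Apr/2007:22:12:53 -0400] "GET /robots.txt HTTP/1.1" 200 25 "-" "ia_archiver-web.archive.org"',
]

counts = tally_paths(SAMPLE)
for path, n in counts.most_common():
    print(n, path)
```

Running it over the full access log would show whether the crawler ever requests pages beyond /pages/1/index.htm.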

Thank you. I will post this on the forum.





Poster: ARossi Date: Apr 17, 2007 12:12pm
Forum: faqs Subject: Re: Understanding the IA 'Crawl'

Hi,

I don't work with the web crawling team, but I would be willing to bet they'd have a better chance of answering this question if you told them the URL of your site.

Also, there's a forum for the web archive at http://www.archive.org/web/web.php#forum and they'd be more likely to see your query if you posted it there.

Thanks,

Alexis